Exploring the molecular mechanisms underlying the bacterial water-to-land transition is a critical 
step toward a better understanding of the functioning and stability of terrestrial ecosystems. 
Here, we perform comprehensive analyses based on a large variety of bacteria by integrating 
taxonomic, phylogenetic and metagenomic data, in the quest for a unified view that elucidates 
genomic, evolutionary and ecological dynamics of the marine progenitors in adapting to nonaquatic 
environments. We hypothesize that bacterial land colonization is dominated by a single-gene sweep, 
that is, the emergence of dnaE2 derived from an early duplication event of the primordial dnaE, 
followed by a series of niche-specific genomic adaptations, including GC content increase, intensive 
horizontal gene transfer and constant genome expansion. In addition, early bacterial radiation may 
be stimulated by an explosion of land-borne hosts (for example, plants and animals) after initial 
land colonization events. 

The ISME Journal (2014) 8, 1358-1369; doi:10.1038/ismej.2013.247; published online 23 January 2014 
Subject Category: Microbial population and community ecology 

Keywords: adaptive mutagenesis; bacterial land colonization; GC content; genome expansion; HGT; 
metagenomics 



Introduction 

Terrestrial ecosystems must have pressured the 
primordial microbial life with diverse and less 
movable microhabitats, scarce resources and other 
environmental hazards (for example, turbulence 
of pH and temperature, ultraviolet radiation and 
desiccation). Soil bacteria represent the majority of 
biodiversity in terrestrial ecosystems and are essen- 
tially involved in the establishment and evolution of 
primary elements in these ecosystems, such as 
carbon sequestration, nitrogen fixation and element 
cycling (Vogel et al, 2009; Madsen, 2011; He et al, 
2012; Sanford et al, 2012; Yergeau et al, 2012). 
Therefore, bacterial land colonization is arguably 
not only one of the most challenging, but also one of 
the most fundamental and seminal ecological transi- 
tions in bacterial evolution. The conquest of the 
Earth's solid surface must have entailed significant 
genomic changes and genetic innovations in the 
overall architecture, flexible configuration and 
necessary elements of bacterial chromosomes to 
cope with this new environment. For instance, soil 
microbes are intensively reported to possess high 
metabolic versatility, such as metal-reducing ability 
(Venkateswaran et al., 1998), abundance of nitrous 
oxide reductase (Sanford et al, 2012) and antibiotic- 
resistant genes (Riesenfeld et al., 2004). In addition, 
soil-borne bacteria have been found to have the 
highest number of associations with diverse 
hosts as compared with other environment-dwelling 
bacteria (Hooper et al., 2009) and even the most 
co-occurring partners within soil environment itself 
(Freilich et al., 2010), together making terrestrial 
environment the most important yet complicated 
system. 

The next-generation sequencing technology has 
made the bulk generation of pangenomic and 
metagenomic data sets possible for studying phylo- 
genetic and taxonomic biogeography of soil micro- 
bial communities (Fierer and Jackson, 2006; Fierer 
et al., 2012a, b). Currently, major efforts have been 
focused on investigating environment-specific genes 
involved in various metabolic pathways (Sanford 
et al, 2012; Barret et al, 2013) that are hypothesized 
to be responsible for bacterial niche-specific adapta- 
tions (Hacker and Carniel, 2001; Konstantinidis 
and Tiedje, 2005; Shapiro et al., 2009; Coleman 
and Chisholm, 2010). However, such investigations 
may only give us a glimpse of the emergence of some 
specific metabolic features under limited nutrients 
or resources instead of providing global pictures of 
bacterial evolution. Taking bacterial land coloniza- 
tion as an example, there has been limited progress 
in revealing how bacterial communities have 
conquered the land, survived in the new and harsh 
environments and formed unique patterns of taxo- 
nomic diversity distinct from other communities, 
such as the aquatic and the host associated 
(Madigan et al., 2011). Therefore, identification 
of adaptive and diversifying evolutionary processes 
in association with the water-to-land transition is of 
critical importance in systematically understanding 
bacterial radiation, host-pathogen coevolution and 
ecosystem stability. 

One of our recent studies has clarified the 
relationship between error-prone DNA synthesis 
and GC content variations, showing that a paralog 
of replicative DnaE polymerase — DnaE2 — is respon- 
sible for bacterial GC increase, whereas other 
mutator genes only play fine-tuning roles in this 
process, and that DnaE2-containing bacteria are 
found to be specifically enriched in soil environments 
(Wu et al., 2012). We hypothesize that DnaE2 
may play a major role in the bacterial water-to-land 
transition, as it is more prevalent in bacterial species 
found in terrestrial than aquatic environments. 
To test this hypothesis, here we perform 
comprehensive comparative analyses based on a 
large quantity of bacteria by combining evidence 
from taxonomic, metagenomic and phylogenetic 
analyses. We also provide a unified view on 
bacterial land colonization by invoking GC content 
variation and genome size expansion. 



Materials and methods 

Taxonomic structure analysis 

The taxonomic structure of all Proteobacteria is 
estimated from the National Center for Biotechnology 
Information (NCBI) taxonomy website 
(http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1224). 
Only bacteria with available full genomic sequences are included in this 
analysis, that is, 1620 in total (on 19 March 2013) 
after the exclusion of Epsilon-proteobacteria. 
An alternative subset of Proteobacteria with detailed 
annotation of bacterial habitats and DnaE2 polymerases 
is collected from our previous study (195 genomes in total, 
of which 84 are DnaE2-containing bacteria) (Wu et al., 2012) 
and is also used for taxonomic analysis. The data set used for estimating 
the taxonomic structure of soil bacteria is obtained from 
a previous collection (Madigan et al., 2011). The 
presence/absence of DnaE2 in 98 randomly selected 
terrestrial and aquatic bacteria, which are grouped 
based on both taxonomic positions and lifestyles, is 
further reannotated (Supplementary Table S1). The 
detailed metadata are retrieved from the Genomes 
Online Database (http://www.genomesonline.org/; 
Pagani et al., 2012). 

Detection of DnaE2 in metagenomic data sets 
The metagenomic data for the Sargasso Sea are 
collected from a previous publication (Venter et al., 
2004). The newly released metagenomic data from 
Peru Margin sediment are also collected (Orsi et al., 
2013). All other metagenomic data are from the 
JGI metagenomics program 
(http://genome.jgi.doe.gov/programs/metagenomes/index.jsf). We collect 
7 334 298 protein sequences from aquatic environments 
(including marine and fresh water) and 
14 303 396 protein sequences from soil environments 
for the detection of DnaE polymerases. The detailed 
identification procedures are described in 
Supplementary Figure S1. We first extract all 
peptide sequences annotated as 'DNA polymerase 
III alpha subunit' (DnaE/PolC polymerases) with a 
length of > 100 amino acids and then recruit these 
candidate DnaE/PolC sequences for further HMM 
(hidden Markov model) profiling using HMMER 
(version 3.0) (Finn et al., 2011). Three different 
DnaE/PolC HMM profiling matrices (from DnaE1 + DnaE3, 
DnaE2 and PolC, respectively) are built 
(hmmbuild) based on our previous curation (Wu 
et al., 2012). These three HMM matrices are then 
used to search each candidate DnaE polymerase 
extracted from the metagenomic data sets 
(hmmsearch), yielding three different HMM 
profiling scores and their corresponding E-values. The scores 
reflecting the similarity of the candidate sequence 
to each of the three HMM profiling matrices are 
then compared for further classification. The candidate 
sequence is identified as DnaE2 only when its 
score with the DnaE2 HMM matrix is the largest and 
at the same time 1.5 times larger than that with the 
DnaE1 + DnaE3 HMM matrix. A DnaE1 + DnaE3 
polymerase is identified in the same way (as 
long as the score with DnaE1 + DnaE3 is the largest 
and also 1.5 times larger than the score with the 
DnaE2 HMM matrix), whereas a polymerase is 
identified as PolC as long as it has the best similarity 
score with the PolC HMM matrix, as PolC is very 
distinct from other polymerases. All other polymerases 
are grouped as unclassified for better data 
quality and excluded from further analysis. Three 
mutator genes (mutT, mutY and mutM) that participate 
in DNA repair and contribute to GC content 
variation (loss of mutT increases GC, whereas loss of 
mutY/M increases AT) (Garcia-Gonzalez et al., 2012; 
Wu et al., 2012) are also collected in order to 
examine their relative abundances in terrestrial 
versus aquatic environments. As these repair genes 
tend to have shorter lengths (encoding ~200 amino 
acids), only sequences with HMM scores ≥50 and 
E-values ≤1e-10 are extracted for further analysis. 
In addition, DnaE sequences in 12 freshwater samples 
and 20 FACE (Free-Air Carbon Dioxide Enrichment) 
soil samples are used for sequencing saturation 
analysis, and the relative abundance of DnaE2 in six 
samples of different depths (5, 30, 50, 70, 91 and 
159 m) from Peru Margin marine sediment is also 
examined. The proportion of DnaE2 (DnaE2%) is 
calculated as follows (based on the fact that 
DnaE1 + DnaE3 sequences are generally single copy 
and thus their numbers can roughly reflect the 
total number of bacteria): 



$$\mathrm{DnaE2\%} = \frac{\text{No. of DnaE2}}{\text{No. of (DnaE1 + DnaE3)}} \times 100$$
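
The classification rule and the DnaE2% ratio described in this subsection can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the example score triples, the tie handling, and the reading of "1.5 times larger" as "at least 1.5 times the other score" are all assumptions made for illustration, and the mutator-gene thresholds are taken as a score of at least 50 and an E-value of at most 1e-10. The scores are assumed to have already been parsed from hmmsearch output.

```python
from collections import Counter

# Margin between the DnaE2 and DnaE1+DnaE3 scores, as stated in the text.
SCORE_RATIO = 1.5


def classify_polymerase(score_e1e3, score_e2, score_polc):
    """Classify one candidate from its three HMM scores (assumed positive)."""
    best = max(score_e1e3, score_e2, score_polc)
    # PolC only needs the top score; checking it first on ties is an assumption.
    if score_polc == best:
        return "PolC"
    # ">=" is one reading of "1.5 times larger"; the source does not settle ties.
    if score_e2 == best and score_e2 >= SCORE_RATIO * score_e1e3:
        return "DnaE2"
    if score_e1e3 == best and score_e1e3 >= SCORE_RATIO * score_e2:
        return "DnaE1+DnaE3"
    return "unclassified"


def keep_mutator_hit(hmm_score, e_value, min_score=50.0, max_e_value=1e-10):
    """Filter for the short mutT/mutY/mutM hits (thresholds as read from the text)."""
    return hmm_score >= min_score and e_value <= max_e_value


def dnae2_percent(n_dnae2, n_dnae1_e3):
    """DnaE2% = 100 * (No. of DnaE2) / (No. of DnaE1 + DnaE3)."""
    if n_dnae1_e3 <= 0:
        raise ValueError("need at least one DnaE1+DnaE3 sequence")
    return 100.0 * n_dnae2 / n_dnae1_e3


if __name__ == "__main__":
    # Hypothetical (DnaE1+DnaE3, DnaE2, PolC) score triples, for illustration only.
    hits = [(310.0, 120.0, 40.0), (150.0, 420.0, 35.0),
            (200.0, 260.0, 30.0), (50.0, 60.0, 500.0)]
    counts = Counter(classify_polymerase(*h) for h in hits)
    print(dict(counts))
    # Counts taken from the North Pacific Ocean row of Table 2.
    print(f"{dnae2_percent(307, 1414):.2f}")
```

With the North Pacific Ocean counts from Table 2 (307 DnaE2 and 1414 DnaE1 + DnaE3 sequences), the ratio gives 21.71, matching the DnaE2% reported there. The third hypothetical triple falls into the unclassified group because its DnaE2 score is the largest but less than 1.5 times the DnaE1 + DnaE3 score.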



Phylogenetic tree construction 

We build all phylogenetic trees using MEGA 5.05 
(Tamura et al., 2011) under the JTT + Γ4 model. The 
phylogeny of 137 DnaE2 sequences is built using the 
neighbor-joining method with 100 bootstraps. 
A total of 10 randomly selected DnaE1 sequences 
from Actinobacteria are used as the outgroup. The 
main topology of the DnaE2 tree is also validated with a 
more comprehensive outgroup data set including 
25 DnaE1 sequences from five different taxonomic 
groups, using both neighbor-joining (Supplementary 
Figure S2A) and more robust maximum likelihood 
methods (Supplementary Figure S2B). Four DnaE2 
sequences (one from Nitrospirae, one from Verrucomicrobia 
and two from Delta-proteobacteria) 
identified previously by the HMM profiling method 
are excluded from further analysis as they are 
phylogenetically classified as DnaE1 (Supplementary 
Figure S2). The phylogenies of three case 
studies, referring to Chlamydiae-Verrucomicrobia, 
Azospirillum (Alpha-proteobacteria) and 
Beta-proteobacteria, are built from DnaE1 sequences using the 
neighbor-joining method with 500 bootstraps and 
visualized with the help of the online iTOL tool 
(Letunic and Bork, 2011). 

Results 

Evidence from the taxonomic structure 
Diverse soil bacterial communities are often 
characterized by the dominance of Proteobacteria 
(mainly within the Gamma, Beta and Alpha classes), 
Actinobacteria, Acidobacteria, Planctomycetes and 
Verrucomicrobia (Dedysh et al., 2006; Zhou et al., 
2009; Bergmann et al., 2011; Montana et al., 2012). 
Community-based analyses in paleosols also confirm 
this distinct taxonomic distribution (Chandler 
et al., 1998; Hart et al., 2011). Intriguingly, we 
notice that these dominant bacterial phyla in soil 
are also DnaE2 bearing, with very few 
exceptions (Table 1). For further validation, we 
compare the taxonomic structure of DnaE2-containing 
Proteobacteria with that of terrestrial Proteobacteria 
(given that Proteobacteria contain the most available 
genome sequences representing an unbiased 
sampling). Our result demonstrates that these 
two groups of bacteria have very similar taxonomic 
structures: both show underrepresentation 
of Gamma-proteobacteria but overrepresentation 
of Beta-proteobacteria, whereas the proportions of 
Alpha- and Delta-proteobacteria are not much 



Table 1 Number of DnaE2-containing bacteria in each phylum 

Figure 1 Taxonomic structure of DnaE2-containing and soil-dwelling 
bacteria. 'P_sub' stands for a subset of Proteobacteria 
collected from our previous study (Wu et al., 2012) that in general 
reflects the taxonomic structure of all sequenced Proteobacteria in 
the NCBI database (P_NCBI). The data set for the 'Soil' bacteria is 
from Madigan et al. (2011). 'DnaE2' includes all DnaE2-containing 
bacteria in 'P_sub'. 



variable as compared with the entire Proteobacteria 
population (Figure 1). Interestingly, Epsilon-proteobacteria 
are undetected or nearly absent in both 
data sets. The similar community structure between 
DnaE2-bearing and soil-dwelling bacteria strongly 
indicates that the appearance of dnaE2 plays an 
important role in shaping the biogeographic pattern 
of terrestrial bacteria. Moreover, the presence of 
DnaE2 in most bacteria is associated with the terrestrial 
environment, whereas the absence of DnaE2 is 
found to be common in aquatic bacteria even when 
comparing within the same phylum (Supplementary 
Table S1), suggesting that it is DnaE2, not taxonomy, 
that is linked to environmental adaptations. 



Evidence from metagenomic study 
We further explore the abundance of DnaE2 using 
the available metagenomic data, as metagenomes 
enable systematic understanding of gene content, 
functional relevance and genomic plasticity 
in natural microbial communities. We argue that 
DnaE2 is more prevalent among terrestrial than 
aquatic bacteria and that the presence of DnaE2 
contributes considerably to the success of bacterial 
land colonization. To test this proposition, we 
collect six different metagenomic samples: three 
data sets generated from aquatic environmental DNA 
and three from terrestrial DNA. Consistent with our 
expectations, a clear enrichment of DnaE2 is 
identified in terrestrial (~55-68% DnaE2-containing 
bacteria) compared with aquatic (only ~11-21% 
DnaE2-containing bacteria) environments (Table 2). We also 
determine the proportion of DnaE2-containing bacteria 
in each of the 12 freshwater and 20 FACE soil 
samples. Intriguingly, we find that the sequencing 
depth (indicated by the abundance of DnaE polymerases) 
correlates linearly with the proportion of 
DnaE2 (Figure 2). Specifically, soil samples present 
a strong upward trend, implying that 
DnaE2-containing bacteria may constitute > 70% 
of total bacterial species at least in the FACE soil 
samples (R = 0.81, P < 0.0001), whereas freshwater 
samples exhibit a strong descending trend, 
indicating that DnaE2-containing bacteria in this 
environment are clearly underrepresented (as low as 
~10%; R = -0.58, P < 0.05). Taken together, these 
results clearly suggest that dnaE2 is a key 
soil-specific gene. In addition, we notice that 
the lower GC contents (~48%) of bacteria found in 
the upper level of the marine sediment (at depths of 
5 and 30 m) are associated with lower abundance of 
DnaE2 (on average, only 42.4% are DnaE2-containing 
bacteria), as compared with the deep marine 
sediment below 50 m (at depths of 50, 70, 91 and 
159 m), where higher GC contents (~53.5%) are 
found to be correlated with a higher proportion of 
DnaE2-containing bacteria (84.5%) (Supplementary 
Table S2), confirming the major role of DnaE2 in 
bacterial GC increase. We can only roughly estimate 
the relative abundance of the three mutator genes 
because of the lack of a benchmarking gene of similar 
length (Supplementary Table S3). The results indicate 
that in all three samples of aquatic 
Figure 2 Correlation between DnaE2 (%) and sequencing depth. 
The total number of DnaE sequences (DnaE1 + DnaE2 + DnaE3) 
and the proportion of DnaE2-containing bacteria (DnaE2%) are 
indicated on the x and y axes, respectively, and the former can be 
roughly used as a measure of sequencing depth. There are 20 soil 
(a) and 12 freshwater (b) samples recruited for this analysis. 



Table 2 Summary of DnaE and PolC polymerases (≥100 amino acids) in six metagenomic samples 

Samples                   DnaE1 + DnaE3   DnaE2   PolC   Unclassified   Total   DnaE2% 
North Pacific Ocean       1414            307     209    543            2473    21.71 
Sargasso Sea              1353            283     798    110            2544    20.92 
Fresh water^a             872             96      85     388            1441    11.01 
Peru Margin sediment^b    92              52      29     19             192     56.52 
Minnesota Farm            66              45      5      40             156     68.18 
FACE^c                    4406            2450    393    2034           9283    55.61 

The number of polymerases is listed in each group for each sample. 

a Twelve samples from fresh water are all recruited for further data saturation analysis in Figure 2. 

b Six samples of marine sediment of different depths are further compared in order to clarify the association between DnaE2% and GC content 
variations in Supplementary Table S2. 

c Twenty samples from FACE (Free-Air Carbon Dioxide Enrichment) sites are also recruited for further data saturation analysis in Figure 2. 
environment, mutT is indeed more abundant than 
mutY/M. In the terrestrial environment, however, only 
the sample from the FACE site has a slightly higher 
enrichment of mutY/M than mutT, whereas the 
other two samples unexpectedly have more mutT 
genes. Currently, we are not sure whether this 
unexpected pattern in two soil samples results 
from lower sequencing depth or, alternatively, 
reflects a common observation for genes that 
play subsidiary roles in altering genomic GC 
content. 
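
The linear relationship between sequencing depth and DnaE2% reported above for the FACE soil and freshwater samples (Figure 2) can be checked with a correlation coefficient. The sketch below assumes that the reported R is a Pearson coefficient, which the text does not state explicitly; the function name and the sample values are hypothetical and are not the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-sample values: total DnaE count (a proxy for sequencing
    # depth) and DnaE2% for a handful of soil samples.
    depth = [120, 340, 560, 900, 1500]
    dnae2_pct = [41.0, 48.5, 55.0, 61.0, 69.5]
    print(f"R = {pearson_r(depth, dnae2_pct):.2f}")
```

A positive R for soil-like data and a negative R for freshwater-like data would correspond to the upward and descending trends described in the text; testing significance (the reported P values) would need an additional step that is not shown here.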



Evidence from phylogenetic analysis 
We also construct the phylogenetic tree of the DnaE2 
polymerase (Figure 3) using 10 DnaE1 sequences as 
an outgroup. We find that dnaE2 is often involved in 
horizontal gene transfer, for example, dnaE2 of 
bacteria in Planctomycetes and Bacteroidetes. However, 
the most striking finding is that DnaE2 sequences in 
terrabacteria (that is, Actinobacteria, Firmicutes, 
Chloroflexi and Deinococcus-Thermus) (Battistuzzi 
et al., 2004; Battistuzzi and Hedges, 2009) are more 
closely related to DnaE1, implying that dnaE2 might 
have first appeared in terrabacteria, which further supports 
our idea about the previously unrecognized contribution 
of DnaE2 to the bacterial water-to-land transition. 



Case studies 

Chlamydiae and Verrucomicrobia are known to be 
closely related (Wagner and Horn, 2006; Griffiths 
and Gupta, 2007), but it remains unclear why they 
have distinct ecological niches and genomic features 



[Figure 3 image: tree with a scale bar of 0.2 and node support values of 100. Clade labels: Beta-proteobacteria, 
Gamma-proteobacteria, Planctomycetes, Alpha-proteobacteria, Bacteroidetes, Delta-proteobacteria, Gemmatimonadetes, 
Acidobacteria, Verrucomicrobia, Actinobacteria II, Chloroflexi, Firmicutes, Deinococcus, Actinobacteria I and the 
dnaE1 outgroup.] 

Figure 3 Phylogenetic tree of DnaE2. The tree is constructed by using MEGA 5.0 under the JTT + Γ4 model (with 100 bootstraps). Ten 
randomly selected DnaE1 sequences from Actinobacteria are used as the outgroup (in gray color). Terrabacteria are colored in blue. 
Horizontally transferred DnaE2 sequences are colored in red. The main topology of the DnaE2 tree is validated with a more comprehensive 
outgroup data set using both neighbor-joining (NJ) and maximum likelihood (ML) methods (Supplementary Figure S2). 
[Figure 4 image. Taxa in panel a: Chlamydia muridarum, Chlamydia trachomatis, Verrucomicrobiae bacterium DG1235, 
Opitutaceae bacterium TAV2, Opitutus terrae PB90-1, Pedosphaera parvula Ellin514, Chthoniobacter flavus Ellin428, 
Akkermansia muciniphila ATCC BAA-835 and Verrucomicrobium spinosum DSM 4136. Taxa in panel b: Tistrella mobilis 
KA081020-065, Thalassospira xiamenensis M-5 DSM 17429, Thalassospira profundimaris WP0211, Caenispirillum salinarum AK4, 
Rhodospirillum rubrum F11, Rhodospirillum photometricum DSM 122, Magnetospirillum gryphiswaldense MSR-1, 
Phaeospirillum molischianum DSM 120, Magnetospirillum sp. SO-1, Magnetospirillum magneticum AMB-1, 
Magnetospirillum magnetotacticum MS-1, Oceanibaculum indicum P24, Candidatus Endolissoclinum patella L2, 
Rhodocista centenaria SW, Azospirillum amazonense Y2, Azospirillum brasilense Sp245, Azospirillum sp. B510 and 
Azospirillum lipoferum 4B.] 

Figure 4 The gain and loss of dnaE2 and its correlation with bacterial land colonization. Bacteria in Chlamydiae-Verrucomicrobia 
(a, with branches colored in magenta and light blue) and Rhodospirillaceae (b) are used for this case study. The phylogenetic trees are 
constructed by using MEGA 5 under the JTT + Γ4 model (with 500 bootstraps). Red circles mapped on the trees are proportional to the 
genome size of each bacterium. Bacteria names labeled in brown color stand for DnaE2-containing bacteria and the green bars in the right 
panel are proportional to the GC content of each bacterium. Although the non-Azospirillum bacteria are associated with the aquatic 
environment, most of them actually dwell in the boundaries or mixtures of soil and water, such as water sediment (for example, 
M. magnetotacticum, M. gryphiswaldense and T. profundimaris) and ditch mud (for example, R. rubrum and P. molischianum). 
(for example, GC contents, genome sizes). We thus 
use them for a further dissection of the enigmatic 
relationship among GC content variation, genome 
size expansion and ecological shifts. Our comparative 
analysis reveals that most of the Verrucomicrobia 
have gained the dnaE2 gene, presumably 
provoking higher GC content and better fitness in 
the soil environment. In contrast, Chlamydiae, lacking 
dnaE2, are evolving toward a very different destiny, 
namely, they are involved in a series of ecological 
shifts (for example, from environment to animals or 
from animals to humans) and host-pathogen coevolution 
events (Horn et al., 2004; Roulis et al., 2012), 
followed by dramatic genome reduction and GC 
decrease (Figure 4a). Notably, only three 
bacteria in Verrucomicrobia (Methylacidiphilum infernorum V4, 
Akkermansia muciniphila and Opitutaceae bacterium 
TAV2) are found to have lost dnaE2. 
The absence of dnaE2 in the first two bacteria is 
consistent with their dramatic genome reductions, 
whereas the loss of dnaE2 in Opitutaceae 
bacterium TAV2 (or Diplosphaera colitermitum 
TAV2) still needs further examination owing to its 
incomplete genome sequence. 



We also examine bacteria of the genus Azospirillum 
that are reported to have transitioned from 
marine to terrestrial environments (Wisniewski-Dye 
et al., 2011). Our results demonstrate that all 
four soil-dwelling bacteria of this genus, having high 
GC contents and large genome sizes, are indeed 
DnaE2-containing bacteria (Figure 4b). However, 
we find it unconvincing that bacterial land 
colonization began from this genus 
(Wisniewski-Dye et al., 2011), given the fact that 
Alpha-proteobacteria are abundant in soil. In 
this case, we notice that most non-Azospirillum 
bacteria within the Rhodospirillaceae family 
have GC contents comparably high to those of the 
four bacteria in Azospirillum (except Candidatus 
Endolissoclinum patella L2, which has experienced 
striking genome reduction because of its ancient 
symbiotic relationship with a marine tunicate; Kwan 
et al., 2012). Therefore, we infer that dnaE2 should 
not only appear in Azospirillum but also be common 
among non-Azospirillum bacteria. Our genome 
screening indeed shows that at least five of these 
non-Azospirillum bacteria also possess dnaE2, and 
thus we further wonder why these water-associated 
non-Azospirillum bacteria (Wisniewski-Dye et al., 
2011) are DnaE2-containing bacteria if DnaE2 is the 
decisive element of bacterial land colonization. 
Our subsequent ecological survey of the DSMZ 
(Deutsche Sammlung von Mikroorganismen 
und Zellkulturen GmbH) microbial collections 
(http://www.dsmz.de/catalogues/catalogue-microorganisms.html) 
indicates that although most of the 
non-Azospirillum bacteria are associated with aquatic 
environments, they actually dwell in the boundary 
zones between water and soil environments, 
such as water sediment (for example, Magnetospirillum 
magnetotacticum, M. gryphiswaldense 
and Thalassospira profundimaris) and ditch 
mud (for example, Rhodospirillum rubrum 
and Phaeospirillum molischianum). In addition, 
Tistrella bauzanensis, which forms a coherent cluster 
with DnaE2-bearing T. mobilis, is reported to be 
soil dwelling (Zhang et al., 2011). Therefore, 
we argue that not only the Azospirillum genus but 
also bacteria of the entire Rhodospirillaceae family 
should have already begun their journey toward 
land colonization. 



Discussion 

The gain and loss of dnaE2 and its association with 
bacterial land colonization 

We have reported previously that ~68% of the 
terrestrial bacteria are DnaE2-containing bacteria 
(Wu et al., 2012). Here we combine evidence 
from taxonomic, metagenomic and phylogenetic 
analyses, revealing that the emergence of dnaE2 is 
a key genetic innovation underlying the success of 
bacterial land colonization. Given the strong correlation 
between dnaE2-bearing and soil-dwelling 
bacteria, we propose that bacterial land colonization 
is an ongoing process that succeeds only 
when the bacterium acquires a dnaE2 gene and is 
independent of taxonomic affiliation. This 
inference explains why the water-to-land transition 
of the Azospirillum genus (Wisniewski-Dye et al., 
2011) has occurred much later than the suggested 
divergence of hydrobacteria and terrabacteria 
(Battistuzzi et al., 2004; Battistuzzi and Hedges, 
2009). The horizontal transfer of dnaE2 to the Azospirillum 
genus should have occurred much later, 
potentially consistent with the radiation of vascular 
plants on land as suggested by Wisniewski-Dye et al. 
(2011), whereas terrabacteria might be one of the 
first groups of bacteria to have gained dnaE2. Our 
results also provide insights into the strikingly 
different evolutionary scenarios of two closely 
related groups, Chlamydiae and Verrucomicrobia. 
The ancestor of Verrucomicrobia gained dnaE2, 
followed by GC increase, genome expansion and 
land colonization, whereas the dnaE2-deficient 
early lineages of Chlamydiae built an ancestral 
relationship with diverse hosts and experienced 
dramatic genomic reduction. Sphaerobacter 
thermophilus in Chloroflexi is also found to be very 
different from its close relatives by having higher GC 
content and living in a terrestrial environment 
(originally isolated from sewage sludge; Pati et al., 
2010). Our genome analysis indicates 
that this bacterium also possesses dnaE2. The only 
exception is Cyanobacteria, which belong to terrabacteria 
yet show no evidence of possessing dnaE2. 
However, we believe that one of the earliest lineages of 
Cyanobacteria may once have had dnaE2 and 
conquered the land environment, as evidenced by 
current phylogenetic analyses that tend to root 
Cyanobacteria at Gloeobacter violaceus (Nakamura 
et al., 2003), a high-GC bacterium that prefers 
terrestrial environments (Sánchez-Baracaldo et al., 
2005). That is to say, the current marine habitat 
of Cyanobacteria may actually result from a back-to-the-sea 
event, which is in good agreement with 
previous studies (Sánchez-Baracaldo et al., 2005; 
Wisniewski-Dye et al., 2011). 



GC increase vs genome expansion in relation to 
environmental adaptations 

We have provided evidence in our previous 
(Wu et al., 2012) and present studies to support 
the proposition that DnaE2 plays the major role in a 
'dice-casting' of bacterial evolution, although the 
detailed molecular mechanisms of bacterial 
GC increase remain debatable. We further argue that GC increase is the 
prerequisite of genome expansion on the 
following grounds. First, it has long been known 
that GC content increase is linearly correlated 
with genome size expansion (Musto et al., 2006). 
Second, recent studies recognize GC content as one of the 
important barriers to horizontal gene transfer, 
as bacterial genic GC contents do 
not deviate much from the GC content of the main 
chromosomes (Popa et al., 2011; Nishida, 2012a, b; 
Hayek, 2013). Third, bacteria can selectively silence 
foreign genes whose GC content is lower than the 
host genomic GC content (Navarre et al., 2006; 
Navarre et al., 2007). Fourth, according to the 
phylogenetic analysis of dnaE2, this gene may have first 
appeared in terrabacteria, which are estimated to have 
begun colonizing land as early as 3.54 to 2.83 Gyr ago 
(Battistuzzi and Hedges, 2009). Therefore, there is a 
possibility that the emergence of dnaE2 (and the 
subsequent GC increase) happened before the 
burst of de novo gene-family birth (~3.33-2.85 Gyr ago) 
or at least at the early stage of this 'Archaean genetic 
expansion', when extensive genome expansions 
had not yet begun (David and Alm, 2011). 
Taken together, the gain of exogenous environmental 
DNA/genes of high GC content rarely happens 
without an initial GC content increase of the host 
genome. 

Genome expansion stimulated by GC increase 
enables the success of bacterial land colonization by 
providing bacteria with a higher rate of distant 
horizontal transfers from donors of similar genome 
sizes (Cordero and Hogeweg, 2009) and with more 
genes involved in regulation, signaling or secondary 
metabolism (Cases et al., 2003; Konstantinidis 
and Tiedje, 2004). Within this context, we can better 
understand why environmental pressures are misleadingly 
reported to shape bacterial GC contents 
based on the finding that terrestrial bacteria generally 
have higher GC content than aquatic bacteria 
(Foerstner et al., 2005). In addition, we note that the 
RcGTA (Rhodobacter capsulatus gene transfer 
agent) (Lang and Beatty, 2007), which has been 
reported to boost the efficiency of 
horizontal gene transfers (McDaniel et al., 2010), is 
more enriched in soil than in marine environments 
(Supplementary Figure S3), providing additional 
evidence to explain bacterial genome 
expansion in the terrestrial environment. 

Given the immense metabolic flexibility of 
high-GC DnaE2-containing bacteria, we infer 
that they harbor ample unusual metabolic features 
because of their enhanced ability to gain exogenous 
genetic material. To take Symbiobacterium thermophilum 
as an example, this DnaE2-containing bacterium 
is known to possess a variety of respiratory 
systems found only in Gram-negative bacteria (Ueda 
et al., 2004); DnaE2-containing Silicibacter pomeroyi 
is also found to have some unusual metabolic 
pathways to deal with nutrient-poor habitats 
(Moran et al., 2004); there is even one DnaE2-containing 
bacterium reported to have some eukaryotic 
features, for example, genes involved in sterol 
synthesis (Pearson et al., 2003). In addition, most 
bacteria armed with compound-degrading ability are 
also revealed to be DnaE2-containing bacteria (Phale 
et al., 2007; Wu et al., 2012), uncovering their great 
roles in bioremediation. Furthermore, not only 
unusual but also novel metabolic pathways tend to 
be found in DnaE2-containing bacteria (Table 3). 
Thus, it is not surprising to detect a new pathway for 
calcification in Cyanobacteria (Couradeau et al., 
2012), as postulated by our inference that Cyanobacteria 
may once have possessed dnaE2 and dwelt 
in the terrestrial environment. Even genes involved in 
oxygenic photosynthesis, which are often involved in 
horizontal gene transfer (Shi and Falkowski, 2008), 
may have originated during terrestrial adaptation 
(Battistuzzi et al., 2004). 



A unified view of bacterial land colonization 
Our hypothesis is illustrated in
Figure 5. The marine ancestral dnaE1-containing
bacteria (step 1) gain an active copy of dnaE2 (evolved
from a dnaE1 duplicate) followed by GC increase
(step 2), genome expansion and land colonization
(step 3). Starting from step 3, there are three different
possible evolutionary scenarios. First, some bacteria
at this stage further experience niche-restricted
adaptations because of ecological shifts (to mammals,
for example), lose dnaE2 (step 4) and continue to
evolve, generally in the form of GC decrease (step 5)
and genome reduction (step 6) owing to host jump
(for example, from environment to insects). Second,
there may also be some that have experienced striking
genome reduction but still possess dnaE2 and thus
keep high GC content (step 7). Third, others possibly
reinvade the marine environment (steps 8 and 9) and
further spread to marine-borne hosts (step 10). In
addition, pathogenic/symbiotic bacteria in various
land- and water-dwelling organisms may also evolve
from lineages derived directly from ancestral
dnaE1-containing bacteria.

In summary, the bacterial water-to-land transition
is characterized by a new pathway of adaptive
mutagenesis that arms the bacteria with a genome
of higher GC content, larger genome size and an
open pan-genome so that they are better able to
deal with unfamiliar and hostile soil environments.
This new pathway of adaptive mutagenesis or
genetic innovation recruits the coding product of
dnaE2, which may come from dnaE1 gene duplication,
for error-prone DNA synthesis (observed as GC
increase). The genomic GC increase shaped by
DnaE2-involved error-prone DNA repair is then
inferred to prime subsequent bacterial
genome expansion, which in turn confers on bacteria
a fitness advantage in the soil-based environment.
Driven by positive selection, dnaE2 passes
from one bacterium to another (by horizontal
gene transfer or recombination), sweeping through
different bacterial phyla and triggering new ecological
differentiations. This view is consistent with
the model of 'ecotype-formation mutations' (Cohan
and Perry, 2007). However, significant work remains
to be done in order to reveal the detailed genetic
basis of these adaptive evolutionary changes.
Table 3 Examples of new metabolic pathways identified in the DnaE2-containing bacteria

DnaE2 bacteria                        Phylum              New pathways                                     Reference
Burkholderia cenocepacia              Actinobacteria      Anoxic persistence                               Sass et al. (2013)
Methylococcus capsulatus              α-Proteobacteria    Gluconeogenesis; the ability to use copper in    Ward et al. (2004)
                                                          regulation of methanotrophy; sterol and
                                                          hopanoid biosynthesis
Ruegeria pomeroyi                     α-Proteobacteria    Assimilation of dimethylsulfoniopropionate       Reisch et al. (2011)
Thermomicrobium roseum                Chloroflexi         Oxidization of CO aerobically                    Wu et al. (2009)
Candidatus Methylomirabilis oxyfera   Division NC10       Methane oxidation under anoxic conditions        Ettwig et al. (2010)
Pseudomonas sp. strain MT1            γ-Proteobacteria    4- and 5-Chlorosalicylate degradation            Nikodem et al. (2003)
Gemmatimonas aurantiaca               Gemmatimonadetes    Synthesis of the carotenoids                     Takaichi et al. (2010)
[Figure 5 stage labels]

(3) Genome expansion and land colonization after gain of dnaE2, e.g., most Actinobacteria and Alpha-proteobacteria.

(4) This group of bacteria lost dnaE2 very recently and thus still hold high GC content, e.g., Bifidobacterium genera, Akkermansia muciniphila.

(5) Slightly reduced genome size and GC content accompanied by dnaE2 loss, e.g., Shewanella baltica.

(6) Most obligate pathogenic/symbiotic bacteria in animals and insects are of this group, holding low GC contents and small genome sizes, e.g., Mycobacterium

(7) Dramatic genome reduction but still having dnaE2 and high GC content, e.g., Brucella melitensis, Thermus thermophilus HB8.

(8) DnaE2 group bacteria reinvade marine environment, experiencing mild genome reduction but still of dnaE2 and high GC content; e.g., Tistrella mobilis.

(9) DnaE2 group bacteria reinvade marine environment followed by genome reduction and dnaE2 loss, e.g., Cyanobacteria.

Legend: Land; GC content (low to high); dnaE1: dnaE1 gene; dnaE2: dnaE2 gene; xxx: intermediate form of dnaE2 loss

Figure 5 A unified model of bacterial land colonization. We propose a conceptual framework here to help clarify the emergence of
dnaE2 and its contribution to bacterial land colonization through a series of genomic changes including GC content increase and genome
expansion. Marine-borne (for example, seaweed and fish) and land-borne organisms (for example, trees for plants, beetles for insects
and cattle for mammals) are also illustrated here to exemplify bacterial radiation because of host diversification. The
dnaE-containing rectangles stand for different stages of bacterial evolution, with the rectangle sizes proportional to bacterial genome
sizes. There are three main routes for bacterial evolution. The first route is the bacterial shift from ocean to land owing to the emergence of
dnaE2, GC increase and genome expansion (black arrows), and further radiations and adaptations because of the explosion of diverse land
hosts. The second route is the reinvasion of the marine environment by bacteria after land colonization (black dashed arrows). The last route, for bacterial
host jumps, is directly from oceanic to land-borne organisms (gray arrows). Key genomic innovations and bacterial examples are labeled
for each stage (up to 10).






Further radiation of the soil microbial community 
Based on this conceptual framework, we also find
clues on the radiation of soil-dwelling microbial
communities to other land hosts. For instance, the
smallest Beta-proteobacterial bacterium, Candidatus
Glomeribacter gigasporarum (~1815 genes), which
has a surprisingly high GC content (54.8%)
(Ghignone et al., 2012), is found to maintain an
ancient relationship with arbuscular mycorrhizal
fungi (Jargeat et al., 2004). We infer that this
bacterium may once have possessed dnaE2 and
experienced an ecological shift from soil to fungi and
dramatic genome reduction, because most of its
relatives are dnaE2-containing and soil-dwelling
bacteria (Supplementary Figure S4). This argument
is also supported by another closely related fungal
endosymbiont, Burkholderia rhizoxinica HKI
454, which still holds dnaE2 and thus a higher GC
content (60.7%) because of a more modest genome
reduction (~3936 genes) compared with Candidatus
Glomeribacter gigasporarum. In addition, recent
metagenomic studies (Cazemier et al., 1999;
Cazemier et al., 2003; Ventura et al., 2007;
Kaltenpoth, 2009; Salem et al., 2012; Sudakaran
et al., 2012) have repeatedly revealed that some
Actinobacteria play essential roles in the insect gut by
helping their insect hosts to use diverse
plant-fiber-derived polysaccharides. As most Actinobacteria are
DnaE2-containing and particularly widespread in
the terrestrial environment, they are presumed to
move from soil to plant hosts and therefore are
regularly encountered by soil-dwelling insects feeding
on plants (Kaltenpoth, 2009). There may also be
some Actinobacteria that have experienced a
soil-to-human host jump, for example, Turicella otitidis, a
DnaE2-containing human pathogen in skin and ear
that has a small genome (~1800 genes) but high GC
content (~71%) (Brinkrolf et al., 2012). In addition,
the shared antibiotic resistome of soil bacteria and
human pathogens (Forsberg et al., 2012) also
supports the pivotal role of soil bacterial communities
in bacterial radiation. Furthermore,
we infer that it is an easier shift for bacteria to move
from soil to inland fresh water than from soil back to
the marine environment, as evidenced by limited overlaps between
freshwater and marine microbial communities (for
example, soil-abundant Beta-proteobacteria are also
typical freshwater inhabitants yet nearly completely
absent in the oceans) (Philippot et al., 2010).



Conclusion 

Here we perform comprehensive analyses and try to
put together a unified view on bacterial land
colonization, which has been proposed to involve gene
duplication, function diversification and a series of
genomic alterations that include GC content
increase and genome expansion. We report that
the emergence and the sweep of dnaE2 are the key
genomic innovations underlying bacterial adaptation
to the terrestrial environment based on three lines
of evidence. First, the similar taxonomic structure
between dnaE2-containing and soil-dwelling bacteria
implies that dnaE2 plays a decisive role in
shaping the unique biogeographic pattern of the soil
microbial community. Second, metagenomic data
screening reveals that dnaE2 is indeed soil specific.
Third, phylogenetic analyses indicate that dnaE2
may have first appeared in terrabacteria. Taken together,
these results consistently show that
dnaE2 is of great relevance to the success of
terrabacterial land colonization, providing a new
perspective for the study of bacterial radiation after
land colonization. Future studies will focus on
experimental validation of this genome-based
hypothesis.